Enhancing Synthetic Speech with Filled Pauses
Abstract
Filled pauses are generally regarded as disfluencies. The purpose of this paper is to show that they can play a positive role in computer-generated speech. Details about their causes, variation in pronunciation, and syntactic location are obtained through a corpus analysis. This knowledge is then used in an experiment that tests whether filled pauses in computer-generated speech have positive effects on listeners. The results show that they can influence how well a listener remembers what is said after the filled pause, as well as the perceived friendliness and honesty of the speaker and the naturalness of the utterance. This research aims to contribute to the integration of filled pauses into speech-enabled applications.

Preface

Computer agents are designed to mimic human behavior. The study of human interaction allows us to enhance the realism of these agents. Studying human speech to improve computer-synthesized speech is a perfect example of this. Unit selection synthesizers can already produce very realistic-sounding speech by using recordings of actual human speech. However, there is more to spoken language than the sound of the messages: the paralanguage.

The initial assignment that led to this thesis spoke of laughter, crying, and other elements of speech used to express an emotion. The choice of which element to focus on, however, was left up to me. The next step was to transcribe a corpus to gain an idea of which elements I could choose from and how frequently they occurred. Something became very obvious even before the transcription had been completed: it was littered with occurrences of “um” and “uh”. Their prevalence intrigued me, and I set out to investigate these so-called “filled pauses”.

Although filled pauses are technically listed in many English dictionaries as interjections, making them actual words, unlike laughter and crying, their role in speech goes beyond that of an interjection. Their classification is the subject of active debate. As these elements are not very common in speech synthesis, whether or not this direction was in accordance with the assignment’s initial premise is a moot point.

The first goal of the research was to discover the locations of these elements, their frequency, the reason for their presence, their function in the dialogue, and their phonetic properties. This information was then used in an experiment conducted to achieve the second goal of this research, which is to determine the effects of filled pauses in computer-generated speech on listeners.

I would like to thank my family, who supported me through this sometimes frustrating project, my supervisors, who provided me with feedback and counsel, and finally all those who took the time to participate in the experiment.
Similar resources
Synthesising Filled Pauses: Representation and Datamixing
Filled pauses occur frequently in spontaneous human speech, yet modern text-to-speech synthesis systems rarely model these disfluencies overtly, and consequently they do not output convincing synthetic filled pauses. This paper presents a text-to-speech system that is specifically designed to model these particular disfluencies more effectively. A preparatory investigation shows that a synthet...
Disfluencies in Change Detection in Natural, Vocoded and Synthetic Speech
In this paper, we investigate the effect of filled pauses, a discourse marker and silent pauses in a change detection experiment in natural, vocoded and synthetic speech. In natural speech, change detection has been found to increase in the presence of filled pauses; we extend this work by replicating earlier findings and exploring the effect of a discourse marker, "like", and silent pauses. Further...
The effect of filled pauses and speaking rate on speech comprehension in natural, vocoded and synthetic speech
It has been shown that in natural speech filled pauses can be beneficial to a listener. In this paper, we attempt to discover whether listeners react in a similar way to filled pauses in synthetic and vocoded speech compared to natural speech. We present two experiments focusing on reaction time to a target word. In the first, we replicate earlier work in natural speech, namely that listeners r...
Filled Pauses in Speech Synthesis: Towards Conversational Speech
Speech synthesis techniques have already reached a high level of naturalness. However, they are often evaluated on text reading tasks. New applications will call for conversational speech instead, and disfluencies are crucial in such a style. This paper presents a system to predict filled pauses and synthesise them. Objective results show that they can be inserted with 96% precision an...
Synthesising Uncertainty: The Interplay of Vocal Effort and Hesitation Disfluencies
As synthetic voices become more flexible, and conversational systems gain more potential to adapt to the environmental and social situation, the question needs to be examined, how different modifications to the synthetic speech interact with each other and how their specific combinations influence perception. This work investigates how the vocal effort of the synthetic speech together with adde...
Publication date: 2007